Abstract



This paper describes the item generation procedure
utilized in the development of Computer Managed Review
and Examination (CMRE) courses for the education of nurses
in remote areas. The major emphases are the processes of
domain definition, item writing, and item editing.
Specific discussion is presented concerning methods of
item construction to assess technical vocabulary, concept
learning, and the application of nursing principles to
the solution of problems. The entire test construction
procedure is briefly reviewed; this procedure includes
numerous quality checks to insure the production of both
high calibre instructional materials and domain-referenced
tests. The criteria used at the various editing and
review stages are mentioned. An initial evaluation of
the items is made, and problems inherent in the item
generation procedure are discussed.

Since the mid to late 1960's, traditional achievement testing has
been the subject of considerable criticism and innovation. Glaser
(1963), Bormuth (1970), and other measurement experts have strongly
encouraged the educational community to re-evaluate the testing
procedures used in instructional programs. The construction and selection
of achievement test items, in particular, has been a focus of attention.

The problematic nature of achievement test and item construction
rises to even greater prominence in Computer Managed Instruction (CMI)
and other individualized instruction programs (Anderson et al., 1974;
Mitzel, 1974). As Robert Gagné (1975, p. 145) has said, "To the extent
that modern educational trends, at all levels, favor self-education and
independent learning, the means of observing and assessing the outcomes
of learning becomes a matter of considerable importance."

Presented at the 1976 AERA Annual Meeting, San Francisco,
California, April, 1976.

The importance of item construction is exemplified by the fact
that this CMRE project involves the development of eight nursing courses,
each of which requires the construction of approximately 1,100
items,¹ for a project total of almost 9,000 items!

Typically, testing in education serves a single prime purpose: to
accredit or certify competence. Within the CMRE model of this project,
however, four distinct purposes can be delineated for testing. First,
the initial test serves a placement function. Then, the review questions
throughout the instructional program serve two functions: to diagnose
student learning and prescribe remedial instructional materials (and
hence, to keep students from taking tests before they are adequately
prepared for them), and to maintain student interest, motivation, and
attentiveness. After instruction has been completed, the final
examination serves the traditional credit-awarding function.

Achievement measures are evaluated in terms of content validity.
Most authors in the field of measurement (Cronbach, 1971) recognize
that content validity is assessed largely with respect to the degree
to which logical and systematic test construction procedures are utilized.
The prime goal of this paper is to state explicitly the test construction
procedures employed. Additionally, many individuals (Baker, 1974;
Ferguson, 1972; Glaser, 1970; Hambleton, 1974; Nitko, 1974; Willingham
and Geisinger, 1976) have been interested in the degree of parallelism
between educational measurement practices and the other components of
instructional systems. Only through such explicit statements of
procedures can procedural inconsistencies ("working at cross-purposes")
be identified and removed.

¹Of these 1,100 items, approximately 700 are test items and 400
are for review. Although the differences between these items are often
not large, they are kept as two separate item pools. Since review items
follow instruction more closely chronologically, we have tended to use
these items with more specificity.

Developing the CMRE Courses

Staffing 

Individual faculty members from the College of Human Development's
Department of Nursing were assigned as course authors for each of the
eight CMRE courses. Additionally, a nurse-research assistant was hired
half-time to assist each author. While these assistants aided the
authors in reviewing academic materials for use "off-line," their major
role was that of item writing. A professor from the Department of
Educational Psychology conducted several item writing workshops for
these individuals. Two graduate students in Educational Psychology
served as principal item reviewers or editors, and occasionally wrote
items.

The Course Development Process 

The following eleven-step course development sequence was used
in the preparation of all eight courses to facilitate progress and to
keep track of the location of items within the organization.

1. Instructional Material Developed 

2. Initial Test Item Construction 

3. Initial Item Review 

4. First Item Revisions Made 

5. Item Typing on Paper 

6. Second Item Review 

7. Second Item Revisions Made

8. Magnetic Tape Selectric Typing 

9. Course Author Item Approval Granted 

10. Magnetic Tape Corrected 

11. On-line Review and Revision 
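Because the eleven steps exist partly to keep track of where each item is, they map naturally onto a small status record. The Python below is a minimal sketch, assuming one status per item keyed by a hypothetical item ID; the names `Stage`, `advance`, and `stage_counts` are illustrative and do not come from the project's actual software, which the paper does not describe.

```python
from enum import IntEnum


class Stage(IntEnum):
    """The eleven course development steps, in the order listed above.

    Hypothetical bookkeeping sketch; the paper does not say how item
    locations were actually recorded.
    """
    INSTRUCTIONAL_MATERIAL_DEVELOPED = 1
    INITIAL_ITEM_CONSTRUCTION = 2
    INITIAL_ITEM_REVIEW = 3
    FIRST_REVISIONS_MADE = 4
    TYPED_ON_PAPER = 5
    SECOND_ITEM_REVIEW = 6
    SECOND_REVISIONS_MADE = 7
    MAGNETIC_TAPE_TYPING = 8
    AUTHOR_APPROVAL_GRANTED = 9
    MAGNETIC_TAPE_CORRECTED = 10
    ONLINE_REVIEW_AND_REVISION = 11


def advance(stage: Stage) -> Stage:
    """Move an item to the next step; the final step has no successor."""
    if stage is Stage.ONLINE_REVIEW_AND_REVISION:
        raise ValueError("item has already reached the final step")
    return Stage(stage + 1)


def stage_counts(locations: dict[str, Stage]) -> dict[Stage, int]:
    """Tally how many items currently sit at each step (item IDs are hypothetical)."""
    counts = {stage: 0 for stage in Stage}
    for stage in locations.values():
        counts[stage] += 1
    return counts
```

A coordinator could call `stage_counts` on the whole item file to see, for example, how many items are waiting for the second review.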

Instructional Material Developed 

The development of instructional materials for these courses is
primarily a two-step process. Each author first enumerates his major
goals for the course. From this list of goals, a detailed subject-matter
outline is constructed. Using this outline, a committee of the
Department of Nursing then judges the adequacy of course coverage. The
second step involves the operationalization of the outline; appropriate
materials (texts, articles, films, tapes, etc.) are developed or
selected to represent the topic areas listed in the outline. (This
procedure is crucial for the test construction process; Appendix A
describes how the selection and development of instructional material
relate to current measurement topics such as universe and domain
definition, domain-referenced testing, and criterion-referenced testing.)
Then, this body of curricular information is divided into segments,
called lessons, each sized for a single study session. Each lesson is
weighted according to its importance, and this weight is directly
proportional to the eventual number of items for that lesson.
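Since each lesson's item count is proportional to its weight, a course's item total can be split across lessons arithmetically. The sketch below assumes largest-remainder rounding so that whole-number counts still add up to the total (the paper does not say how fractions were rounded); the lesson names and weights are hypothetical, and the 700 test items per course comes from the footnote above.

```python
import math


def allocate_items(weights: dict[str, float], total_items: int) -> dict[str, int]:
    """Split total_items across lessons in proportion to their weights.

    Largest-remainder rounding is an assumed choice: it keeps each count
    within one item of its exact quota while summing exactly to the total.
    """
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    quotas = {lesson: total_items * w / weight_sum for lesson, w in weights.items()}
    counts = {lesson: math.floor(q) for lesson, q in quotas.items()}
    leftover = total_items - sum(counts.values())
    # Hand the remaining items to the lessons with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda lesson: quotas[lesson] - counts[lesson], reverse=True)
    for lesson in by_remainder[:leftover]:
        counts[lesson] += 1
    return counts


# Hypothetical weights for three lessons sharing a 700-item test pool.
print(allocate_items({"Lesson 1": 3, "Lesson 2": 2, "Lesson 3": 1}, 700))
```

With these weights the exact quotas are 350, 233.3, and 116.7, so the one leftover item goes to the third lesson.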

The Use of Summary Statements

Consistent with the procedures used at the University of Illinois'
CAICMS project, subject matter experts devise summary statements from the
instructional material (Wietecha and Anderson, 1974). Summary statements
are abstractions of the major themes within instruction and of
those elements of critical importance within each lesson. A single
lesson might have as many as twenty to twenty-five such statements.
These summary statements are copied verbatim from the textual material
on which the lesson is based: the subject and predicate keep the exact
wording of the text, and the statements as a whole are kept as
consistent with the language of the text as possible. The length of
the summary statements ranges from a single sentence to a short
paragraph.² Each summary statement is referenced by module, lesson,
and page of the text or article within the lesson.
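The module/lesson/page reference attached to each summary statement amounts to a small record type. The Python below is a sketch under that reading only; the field names, the `source` title field, and the grouping helper are hypothetical conveniences, not the project's own format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SummaryStatement:
    """A summary statement copied verbatim from the instructional material.

    Field names are hypothetical; the paper specifies only that each
    statement is referenced by module, lesson, and page.
    """
    module: int
    lesson: int
    source: str  # title of the text or article (assumed field)
    page: int    # page within that text or article
    text: str    # the verbatim wording of the statement

    def reference(self) -> str:
        """Human-readable module/lesson/page reference."""
        return f"Module {self.module}, Lesson {self.lesson}, {self.source}, p. {self.page}"


def by_lesson(statements: list[SummaryStatement]) -> dict[tuple[int, int], list[SummaryStatement]]:
    """Group statements by (module, lesson), e.g. to review one lesson's set at once."""
    groups: dict[tuple[int, int], list[SummaryStatement]] = defaultdict(list)
    for statement in statements:
        groups[(statement.module, statement.lesson)].append(statement)
    return dict(groups)
```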

Writing the Test Items 

After attending the intensive objective item-writing workshops,
the subject matter experts involved in each course construct the bulk
of the questions for that particular course. (By subject matter
experts, both course authors and course assistants are meant.)
Each item is constructed from a summary statement. Like each summary
statement, each item is referenced to the page of the written
instruction from which it came. This reference is used later in
forming diagnostic-prescriptive statements for the examinees.
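The page reference on each item is what makes a diagnostic-prescriptive message possible: the pages behind the missed items are the pages to restudy. This Python sketch assumes a hypothetical `Item` record and message wording; it shows only the lookup, not the project's actual feedback text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """Hypothetical item record carrying the page it was written from."""
    item_id: str
    lesson: int
    page: int  # page of the written instruction the item came from


def restudy_pages(items: list[Item], missed: set[str]) -> dict[int, list[int]]:
    """Collect, per lesson, the pages behind every item the student missed."""
    pages: dict[int, set[int]] = defaultdict(set)
    for item in items:
        if item.item_id in missed:
            pages[item.lesson].add(item.page)
    return {lesson: sorted(found) for lesson, found in sorted(pages.items())}


def prescription(plan: dict[int, list[int]]) -> list[str]:
    """Turn the page lists into short messages (assumed wording)."""
    return [
        f"Lesson {lesson}: restudy page(s) {', '.join(str(p) for p in pages)}"
        for lesson, pages in plan.items()
    ]
```

Because two missed items written from the same page produce a single page entry, the student is pointed at the material rather than at individual questions.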

²In the interest of increasing efficiency, after the first year
of the project, some course authors have dispensed with the use of
summary statements. In their place, the authors "highlight" or
underline those statements in the instructional materials themselves.
This, of course, saves the time of copying the statement verbatim from
the book. Furthermore, it permits the author to select those topics on
which he desires items, while allowing the course assistant to actually
construct the items from the underlined passages. This, too, saves
time.




Basically, four distinct types of items typify all the items used
on the tests. The first is used if the summary statement includes a
specific term or name which the subject matter expert believes the
student should be able to recall (rather than recognize). A constructed
response item, in which the student is required to type in the term, is
used. In this case, however, the stem of the item is altered from the
verbatim summary statement through paraphrasing. Sometimes,
grammatical transformations are also performed on the paraphrased
summary statement. This insures that the item is semantically encoded
in memory, not merely orthographically or phonologically encoded (see
Anderson, 1972, for an in-depth explanation of this point). Item writers
must be reminded often that this type of item is only appropriate for
specific words or phrases. Although it is a relatively easy kind of item
to construct, because there is no need for distractors, it is not
appropriate for testing general concepts that have many synonyms.
Generally, no more than two or three synonyms are allowed to be keyed
as correct. Approximately ten percent of our items are of this variety.
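A constructed-response item with at most two or three keyed synonyms can be scored by comparing the typed answer against the key. The sketch below assumes that matching ignores case and extra spaces and nothing else; the paper does not describe the actual matching rules, and the example stem is hypothetical.

```python
def normalize(response: str) -> str:
    """Assumed normalization: ignore case and collapse extra whitespace."""
    return " ".join(response.lower().split())


class ConstructedResponseItem:
    # "Generally, no more than two or three synonyms are allowed to be keyed as correct."
    MAX_KEYED = 3

    def __init__(self, stem: str, keyed: list[str]) -> None:
        if not 1 <= len(keyed) <= self.MAX_KEYED:
            raise ValueError(f"key between 1 and {self.MAX_KEYED} answers, got {len(keyed)}")
        self.stem = stem
        self.keyed = {normalize(answer) for answer in keyed}

    def score(self, response: str) -> bool:
        """True if the typed response matches a keyed synonym after normalization."""
        return normalize(response) in self.keyed


# Hypothetical item written from a paraphrased summary statement.
item = ConstructedResponseItem(
    "A bluish discoloration of the skin and mucous membranes caused by "
    "inadequate oxygenation is called ____.",
    ["cyanosis"],
)
```

Exact matching is deliberately strict. A misspelled answer is scored wrong, which is one practical reason to reserve this format for specific terms.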

A second type of item assesses the ability to employ nursing
principles. Generally, such a principle recommends a course of action
to the nursing student which is appropriate under certain circumstances.
The stem of such a question represents one specific circumstance.
The nurse is asked what to do and must correctly apply the
principle to this new situation. Unless, as is infrequently the case,
the desired response is embodied by a specific term, a multiple-choice
format is used, with a number of possible actions listed as options.
Preferably, each such option represents a different orientation or
principle. All options are mutually exclusive. The nurse must select
which action is best. This type of item may also be used for
computational problems. (For example, calculations of proper dosage are
important in some nursing courses.) Most often, the examinee must type
in the correct response in such computations. This controls for the
possibility that the examinee could "work back" from the options to
discover the correct answer. Any problem, either verbal or numerical,
presented to the nurse is new, different from any examples given as
part of instruction. This insures that the student must determine the
answer by applying the principle, rather than answering from rote.
The Illinois CAICMS project referred to these principle-testing items as
application items (Wietecha and Anderson, 1975). Both titles seem
equally appropriate. Principle-testing questions account for
approximately fifteen percent of the CMRE nursing questions.
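For computational items, the key is computed and the examinee types a number, so there are no options to work back from. The sketch uses the common "desired over on-hand times quantity" dosage formula only as an illustration; the formula choice, the tolerance of 0.01, and the example order are assumptions and are not taken from the CMRE courses.

```python
def keyed_dose(desired: float, on_hand: float, quantity: float) -> float:
    """Illustrative formula: amount to give = (desired dose / dose on hand) x quantity."""
    if on_hand <= 0:
        raise ValueError("dose on hand must be positive")
    return desired / on_hand * quantity


def score_typed_answer(response: str, key: float, tolerance: float = 0.01) -> bool:
    """Score a typed numeric answer; the tolerance is an assumed parameter."""
    try:
        value = float(response.strip())
    except ValueError:
        return False  # non-numeric input is scored as incorrect
    return abs(value - key) <= tolerance


# Hypothetical problem: 250 mg ordered; the drug is supplied as 500 mg per 5 mL.
key = keyed_dose(250, 500, 5)
```

Here the key works out to 2.5 mL. A student who sees no list of options has to carry out the computation rather than test each choice.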

The third variety of test item tests the student's mastery of
a concept. Most often, the student is presented with a number of
examples and must choose which of them are instances of the concept.
These items always include both positive and negative instances, thus
forcing the student to perform a discrimination in demonstrating his
mastery. Options are generally not taken from the text but newly
constructed. This helps insure that the student has learned the
concept, not just memorized the instances used in the instructional
material. About twenty-five to thirty percent of the items are of
this variety.
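The requirement that every concept item contain both positive and negative instances is simple to enforce mechanically. In the sketch below, an item that violates it is rejected when it is created. All-or-nothing scoring, in which the student must select exactly the positive instances, is an assumption because the paper does not state the scoring rule, and the example options are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConceptItem:
    concept: str
    options: dict[str, bool]  # option text -> is it an instance of the concept?

    def __post_init__(self) -> None:
        # Enforce the rule that every concept item mixes positive and negative instances.
        if set(self.options.values()) != {True, False}:
            raise ValueError("a concept item needs both positive and negative instances")

    def score(self, selected: set[str]) -> bool:
        """Assumed all-or-nothing scoring: exactly the positive instances must be chosen."""
        positives = {text for text, is_instance in self.options.items() if is_instance}
        return selected == positives


# Hypothetical item: which of these intravenous solutions are isotonic?
item = ConceptItem(
    "isotonic IV solution",
    {
        "0.9% sodium chloride": True,
        "lactated Ringer's": True,
        "0.45% sodium chloride": False,
        "3% sodium chloride": False,
    },
)
```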
The final and most frequently occurring type of item does not
specifically assess concepts, principles, or terms, as do the previous
item types. Rather, this type of item is simply a paraphrase of the
summary statement, with an element, usually the subject, deleted. The
task of the student is to recognize a paraphrase of the deleted part
among the options. In some cases, when the subject is important in
its own right but the subject matter expert does not feel the nurse
must recall the term specifically, the words are listed verbatim as
options. (An example of this would be the titles of each of the eight
stages of development in Erik Erikson's theory.) Sometimes the predicate
is tested rather than the subject; an effort is made to test the most
important aspects of the summary statements.

Item Writing Rules 

The project attempted to avoid absolute rules concerning item
writing. Several such rules did emerge, however. True-false questions
are not permitted: few concepts or principles are ever purely true or
false, and the fact that such items are frequently guessed correctly
argued against including either true-false questions or multiple-choice
questions with only two or three options. The options "none of
the above," "all of the above," and combination responses (e.g., "a
and c") are not allowed. Words such as "always," "only," and
"never" are avoided in options, as are other "specific determiners"
or extraneous clues (Davis and Diamond, 1974). Questions aimed at
tricking the student, or forcing him to make overly fine discriminations,
are discouraged.
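Several of these rules are mechanical enough to check automatically before an item reaches an editor. The checker below is a sketch, assuming a minimum of four options (implied by the exclusion of two- and three-option items), a hypothetical regular expression for combination responses such as "a and c", and a word list limited to the three determiners named above. It catches only these surface problems; trick questions and overly fine discriminations still require human judgment.

```python
import re

FORBIDDEN_OPTIONS = ("none of the above", "all of the above")
SPECIFIC_DETERMINERS = ("always", "only", "never")
MIN_OPTIONS = 4  # true-false and two- or three-option items are excluded
# Assumed pattern for combination responses: "a and c", "Both b and d", "a, b & c".
COMBINATION = re.compile(r"^\s*(both\s+)?[a-e]\s*(,\s*[a-e]\s*)*(and|&)\s*[a-e]\s*$", re.IGNORECASE)


def check_item(options: list[str]) -> list[str]:
    """Return a list of rule violations found in an item's options (empty if none)."""
    problems = []
    if len(options) < MIN_OPTIONS:
        problems.append(f"only {len(options)} options; at least {MIN_OPTIONS} required")
    for text in options:
        lowered = text.lower()
        if any(phrase in lowered for phrase in FORBIDDEN_OPTIONS):
            problems.append(f"forbidden option: {text!r}")
        if COMBINATION.match(text):
            problems.append(f"combination response: {text!r}")
        words = set(re.findall(r"[a-z']+", lowered))
        for word in SPECIFIC_DETERMINERS:
            if word in words:
                problems.append(f"specific determiner {word!r} in: {text!r}")
    return problems
```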
Item Format

As mentioned above, most items used are either multiple choice
or short-answer constructed-response format. In general, the choice
between these depends on whether the student must be able to recall
the specific term or answer. As mentioned previously, two separate
pools of items are kept: test items and review items. There tends to
be a larger proportion of constructed-response items among review items
than among test items. Matching questions are not used as test
questions because of scoring problems. However, since one of the goals
of review questions is to maintain student interest and attentiveness,
and because students tend to enjoy such items, matching items are used
in review sections.

One advantage of the computer system is that it permits multiple
choice items with more than one correct option. These are used
primarily to test a student's understanding of a concept, and special
instructions concerning the student's response accompany them. A
typical item of this kind would be "Which of the following are
symptoms of pneumonia? Select one or more correct answers." Clearly,
a student is less likely to answer this type of question correctly
by guessing.
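The guessing advantage can be quantified. Assume a blind guesser picks uniformly among the admissible responses and credit is given only for an exact match. Under those assumptions the chance of success is 1/n for a single-answer item with n options. For a "select one or more" item it is 1/(2^n - 1), since any non-empty subset of the n options is a possible response. With five options this falls from 1/5 (20 percent) to 1/31 (about 3 percent).

```python
from fractions import Fraction


def p_guess_single(n_options: int) -> Fraction:
    """Chance of a blind guess being right when exactly one option is correct."""
    return Fraction(1, n_options)


def p_guess_multiple(n_options: int) -> Fraction:
    """Chance of a blind guess being right when any non-empty subset may be the answer.

    Assumes uniform guessing over all 2**n - 1 non-empty subsets and exact-match scoring.
    """
    return Fraction(1, 2 ** n_options - 1)


for n in (4, 5):
    print(n, p_guess_single(n), p_guess_multiple(n))
```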

Editing the Test Items 

Two graduate students in educational psychology serve as item
editors: one performs the first editing, the other, the second. After
reading each item, the editor makes one of the following four
judgments and then returns the item to the author.
1. "The item is fine, acceptable as it is."

2. "The item has problem 'X'; here is a revision. How does
that seem to you?"

3. "The item has problem 'X'; I suggest you make the
following changes."

4. "The item is poor for the following reason(s). I
suggest you start over again with the summary statement.
Write another item."
The prime job of these editors is to analyze the items according to
accepted rules of good objective item writing (Davis and Diamond,
1974; Tinkelman, 1971; Wesman, 1971; Wood, 1961). The item editors
also re-paraphrase items to make them more straightforward, clearer, and
less reliant on the vocabulary of the text or instructional material.
Frequently, after such revisions, an item iterates between author
and editor several times before both individuals are satisfied that
the item is of acceptable content and form. The item editor also
analyzes the examinee response called for and attempts to
determine whether it is congruent with the purpose of the item. If,
for example, the item writer requires examinees to select the name of
an appropriate drug from a list of five, the goal of such an item may
be better served by a question in constructed-response format.
On the other hand, if the item writer wishes the examinee to respond
with a general concept, especially if that concept is referred to by
various synonyms, a multiple choice format is called for.
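The editor's format check, together with the pool restriction on matching items noted under Item Format, can be written as two small decision functions. This is a sketch of that reading only. The threshold of three acceptable wordings borrows the keyed-synonym limit given earlier, and the `Format` enum and function names are hypothetical.

```python
from enum import Enum


class Format(Enum):
    CONSTRUCTED_RESPONSE = "constructed response"
    MULTIPLE_CHOICE = "multiple choice"
    MATCHING = "matching"


MAX_KEYED_SYNONYMS = 3  # assumed threshold, taken from the keyed-synonym limit


def choose_format(answer_is_specific_term: bool, acceptable_wordings: int) -> Format:
    """Constructed response for a specific term with few wordings; otherwise multiple choice."""
    if answer_is_specific_term and acceptable_wordings <= MAX_KEYED_SYNONYMS:
        return Format.CONSTRUCTED_RESPONSE
    return Format.MULTIPLE_CHOICE


def allowed_in_pool(fmt: Format, pool: str) -> bool:
    """Matching items are used in review sections but not as test questions."""
    if pool not in ("test", "review"):
        raise ValueError(f"unknown pool: {pool!r}")
    return not (fmt is Format.MATCHING and pool == "test")
```

Under this rule, the drug-name example above (one specific name) becomes a constructed-response item, and a general concept with many synonyms stays multiple choice.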
The retyping of items between the two editorial processes helps to
keep the judgments independent. The use of two editors is primarily in
the interest of quality control.

Before each item is placed "on-line," the course author gives
final approval. This allows the author to view all the items of a
module or lesson at a single time and to make more global formative
recommendations. Once the item is "on-line," a member of the project
staff checks it, insuring that it has been correctly keyed and that the
programming is operable. These checks prevent faulty material from
being sent to a mobile instruction site.

Evaluation of the Test Construction Process 

Face Validity 

Face validity refers to how well the items appear to be measuring
the subject matter. Test construction experts have tended to consider
face validity important only to the extent that it is needed to sell a
test. Face validity has heightened importance for CMRE questions,
however, because the students' prime interaction with the instructional
system is in answering the questions. If the students perceive the
questions as trivial or irrelevant, they will lose respect for the
potential usefulness or importance of the instruction. For these
reasons, the relatively high number of realistic problems included in
the examinations for the student to solve appears to be a highly
favorable quality. Furthermore, because the system follows a
diagnostic-prescriptive model, the student is not simply told he has
failed; he is told in what aspects of a lesson he needs further study.
Content Validity

As mentioned earlier, content validity is largely assessed by
viewing the test construction process systematically and judging how
adequately the test items represent the domain. The domain has been
carefully defined, and summary statements have been made extremely
consistent with the instructional materials. Then, the statements are
paraphrased and often grammatically altered so that a student must
comprehend the instructional material to answer the resulting items
correctly. Quality control is assured in that all items are read
several times by several different people before the items go
"on-line." A faculty committee of the Department has evaluated all
course outlines as adequate representations of the subject matter.
Furthermore, nationally known nursing experts are being brought to
Penn State to evaluate the CMRE courses.

Problems 

The test construction process appears to be largely successful. 
However, several problems do appear worthy of mention. 

1. The fact that the item editors were largely unable to
make nursing-related statements leads to inefficiency;
this is especially troublesome in the attempt to
generate plausible alternatives for the multiple
choice items.

2. Even with the use of carefully chosen
off-line instructional material, academic
idiosyncrasies on the part of the text authors
are found. In constructing questions to test
such material, the item writer is forced to preface
the item with "According to ...". One of the goals
of CMRE is to allow students who have previously
learned material to bypass the instruction on it a
second time. Material which is textbook- or
author-specific is not conducive to this purpose.

3. When starting with summary statements, a writer finds
that he can generate a considerable number of
items from a single summary statement. Selecting
which item is best is an extremely unscientific
process. This is especially difficult when the
different items appear to have widely different
levels of difficulty.

4. The procedure of determining what an item assesses
(a concept, a principle, a term, etc.) is a highly
subjective, mentalistic process, subject to
disagreement among item writers.

These are problems requiring practical solutions. As increasing
numbers of CMI and CMRE projects are developed, we hope that such
assessment problems can be solved or handled better. Of
course, the ultimate beneficiaries of such solutions are not the
future CMI developers, but the future students (Popham, 1974).
Instructional Development for Domain-Referenced Testing

Currently, considerable inconsistency exists in educational measurement
vocabulary, especially with respect to criterion-referenced
testing (Alkin, 1974; Donlon, 1974; Millman, 1973, 1974). These test
construction experts have argued that the term criterion-referenced
testing should refer only to those tests whose items are referenced to
either behavioral objectives or amplified behavioral objectives. On the
other hand, a domain-referenced test is "any test consisting of a random
or stratified sample of items selected from a well-defined set or class
of tasks (a domain)" (Millman, 1974).³ On such a measure, each examinee
is measured to discover the degree to which he has attained the intents
of instruction, not to see how he compares with other examinees with
respect to his capacity to learn the instructional material. Millman
further argues that such domain-referenced tests yield scores which are
unbiased estimates of the percentage of all items within a domain,
written or unwritten, that the examinee has mastered. Such scores are
extremely desirable for both placement and crediting decisions, and for
insuring content validity. With a well-defined domain, there should be
high agreement among experts as to what constitutes membership within
the domain; Shoemaker (1975) has argued that a "universe" of all
knowledge within an academic discipline must become operationally
defined as a domain. In this CMRE project, the boundaries of the item
domain have been so operationalized, as the set of "off-line"
instructional materials (texts, articles, pages, films, tapes, etc.).

³Whereas it is currently popular to have criterion-referenced
tests in instructional projects, Ebel (1970, p. 5) has demonstrated that
"in areas where the emphasis is on knowledge and understanding, the
effective use of criterion-referenced measures seems less likely."
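Millman's definition allows a stratified sample of items from the domain. The sketch below treats each lesson's item pool as a stratum and draws a simple random sample within each one. It then estimates the proportion of the domain mastered by weighting each lesson's proportion correct by that lesson's share of the pool. That weighting is the standard stratified-sampling estimator, chosen here as an assumption. The sketch also treats the written pool as the domain, whereas Millman's domain includes unwritten items, and the lesson names and sizes are hypothetical.

```python
import random


def draw_form(pool: dict[str, list[str]], n_per_lesson: dict[str, int], seed: int = 0) -> dict[str, list[str]]:
    """Draw a simple random sample of items within each lesson (stratum)."""
    rng = random.Random(seed)
    return {lesson: rng.sample(items, n_per_lesson[lesson]) for lesson, items in pool.items()}


def estimate_domain_mastery(pool: dict[str, list[str]], form: dict[str, list[str]], correct: set[str]) -> float:
    """Stratified estimate of the proportion of the whole item pool mastered.

    Each lesson's proportion correct is weighted by that lesson's share of the pool,
    so lessons sampled more heavily than their size do not distort the estimate.
    """
    domain_size = sum(len(items) for items in pool.values())
    estimate = 0.0
    for lesson, drawn in form.items():
        p_lesson = sum(item in correct for item in drawn) / len(drawn)
        estimate += len(pool[lesson]) / domain_size * p_lesson
    return estimate


# Hypothetical pool: 30 items for one lesson and 10 for another.
pool = {"L1": [f"L1-{i}" for i in range(30)], "L2": [f"L2-{i}" for i in range(10)]}
form = draw_form(pool, {"L1": 6, "L2": 2}, seed=1)
```

If a student answers every sampled L1 item and no sampled L2 item correctly, the estimate is 30/40 = 0.75. That figure is the share of the pool belonging to L1, whatever the number of items drawn from each lesson.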